A Model for XML Instance Level Integration

Authors

  • Aldo Monteiro do Nascimento
  • Carmem S. Hara
Abstract

Merging instances from different sources to build a data warehouse raises two major problems: entity identification ambiguity and attribute value conflicts. In this paper we propose a data model that helps resolve attribute value conflicts by representing them explicitly in the integrated schema. In this model, the data warehouse is an XML tree populated with data imported from one or more XML sources, and its nodes are annotated with provenance information. The annotations serve two purposes. First, they record the origin of every element in the data warehouse. This information is essential for judging the quality of the data and how much trust to place in it. Second, they allow the portion of each source XML tree that was used to populate the warehouse to be reconstructed. This matters when the original document must be compared with new releases from the same source in order to update the warehouse incrementally. We present algorithms for populating the warehouse according to the proposed model and for reconstructing the source data. We also report results from an experimental study of how the annotations affect the size of the warehouse.
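The abstract does not spell out the paper's population and reconstruction algorithms. The following is only a minimal sketch of the general idea under simplifying assumptions of our own, not the authors' method. Each warehouse node carries the set of sources it came from. A leaf keeps one value per source, so two differing values signal an attribute value conflict. A source's contribution can then be rebuilt by keeping only the nodes annotated with that source. The names `WNode`, `populate` and `reconstruct` are hypothetical. Entity identification is crudely approximated by matching the k-th same-tag child of a source element to the k-th same-tag warehouse child. XML attributes, tail text and relative order across different tags are not tracked.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class WNode:
    """Hypothetical warehouse node annotated with provenance (source ids)."""
    tag: str
    sources: set[str] = field(default_factory=set)
    # Text value per source; more than one distinct value = attribute value conflict.
    values: dict[str, str] = field(default_factory=dict)
    children: list[WNode] = field(default_factory=list)

    def kth_child(self, tag: str, k: int) -> WNode:
        """Return the k-th child with `tag`, creating children as needed.

        Assumption: positional matching among same-tag siblings stands in
        for real entity identification.
        """
        same = [c for c in self.children if c.tag == tag]
        while len(same) <= k:
            new = WNode(tag)
            self.children.append(new)
            same.append(new)
        return same[k]

    def has_conflict(self) -> bool:
        return len(set(self.values.values())) > 1


def populate(node: WNode, elem: ET.Element, source: str) -> None:
    """Merge a source element into the warehouse, annotating each node with `source`."""
    node.sources.add(source)
    text = (elem.text or "").strip()
    if text:
        node.values[source] = text
    seen: dict[str, int] = {}
    for sub in elem:
        k = seen.get(sub.tag, 0)
        seen[sub.tag] = k + 1
        populate(node.kth_child(sub.tag, k), sub, source)


def reconstruct(node: WNode, source: str) -> ET.Element | None:
    """Rebuild the part of `source` that was imported into the warehouse."""
    if source not in node.sources:
        return None
    elem = ET.Element(node.tag)
    if source in node.values:
        elem.text = node.values[source]  # pick this source's value, even if conflicting
    for child in node.children:
        sub = reconstruct(child, source)
        if sub is not None:
            elem.append(sub)
    return elem


if __name__ == "__main__":
    wh = WNode("book")
    populate(wh, ET.fromstring("<book><title>T</title><year>2008</year></book>"), "s1")
    populate(wh, ET.fromstring("<book><title>T</title><year>2007</year></book>"), "s2")
    year = wh.children[1]
    print(year.has_conflict(), year.values)
    print(ET.tostring(reconstruct(wh, "s2"), encoding="unicode"))
```

In this sketch a conflict is not resolved when the data is loaded. Both values stay in the warehouse under their provenance, which is what lets `reconstruct` return each source's original value.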


Similar resources

Schema Based XML Security: RBAC Approach

As a platform-independent solution, XML is expected to be used in many environments, such as application integration and Web Services. Securing XML instances is a basic problem, especially in enterprises with large numbers of users and XML objects and with complex authorization administration. In this paper, a role-based access control (RBAC) model based on XML Schema is proposed. RBAC has been ...


Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity

As the number of XML documents grows, organizing them effectively so that useful information can be retrieved from them becomes essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...


Instance Independent Concurrency Control for Semistructured Databases

Semistructured databases require tailor-made concurrency control mechanisms since traditional solutions for the relational model have been shown to be inadequate. Such mechanisms need to take full advantage of the hierarchical structure of semistructured data, for instance allowing concurrent updates of subtrees of, or even individual elements in XML documents. In earlier work, we presented two...


Management of XML data by means of schema matching

XML Schema Definition is a recommendation from World Wide Web Consortium that specifies the elements All, News, Get Started, Evaluate, Manage, Problem Solve Consider niche tech XQuery to bring improvements to data integration. Georg Gottlob MASTER THESIS Schema Matching and Automatic Web Data can mean any model, for instance, an XML schema, interface definition, semantic management it became a ...


Automatic Data Transformation - Breaching the Walled Gardens of Social Network Platforms

Although many social networks on the Web allow access via dedicated APIs, extracting instance data for further use by applications is often a tedious task. As a result, transforming instance data into Linked Data in the form of OWL, and integrating it with other data sources, becomes harder. To alleviate these problems, this paper proposes a model-driven approach to overcome data ...


Publication date: 2008